2023 Conference article Open Access

Social and hUman ceNtered XR
Vairo C., Callieri M., Carrara F., Cignoni P., Di Benedetto M., Gennaro C., Giorgi D., Palma G., Vadicamo L., Amato G.
The Social and hUman ceNtered XR (SUN) project is focused on developing eXtended Reality (XR) solutions that integrate the physical and virtual world in a way that is convincing from a human and social perspective. In this paper, we outline the limitations that the SUN project aims to overcome, including the lack of scalable and cost-effective solutions for developing XR applications, limited solutions for mixing the virtual and physical environment, and barriers related to resource limitations of end-user devices. We also propose solutions to these limitations, including using artificial intelligence, computer vision, and sensor analysis to incrementally learn the visual and physical properties of real objects and generate convincing digital twins in the virtual environment. Additionally, the SUN project aims to provide wearable sensors and haptic interfaces to enhance natural interaction with the virtual environment and advanced solutions for user interaction. Finally, we describe three real-life scenarios in which we aim to demonstrate the proposed solutions.
Source: Ital-IA 2023 - Workshop su AI per l'industria, Pisa, Italy, 29-31/05/2023

See at: ceur-ws.org Open Access | ISTI Repository | CNR ExploRA

2023 Report Unknown

SUN D1.1 - Management Website
Amato G., Bolettieri P., Gennaro C., Vadicamo L., Vairo C.
Report describing the online, web-accessible repository for all project-related documentation, which serves as the primary means for project partners to manage and share project documents. https://wiki.sun-xr-project.eu
Source: ISTI Project Report, SUN, D1.1, 2023

See at: CNR ExploRA

2023 Journal article Open Access

Interactive video retrieval in the age of effective joint embedding deep models: lessons from the 11th VBS
Lokoc J., Andreadis S., Bailer W., Duane A., Gurrin C., Ma Z., Messina N., Nguyen T. N., Peska L., Rossetto L., Sauter L., Schall K., Schoeffmann K., Khan O. S., Spiess F., Vadicamo L., Vrochidis S.
This paper presents findings of the eleventh Video Browser Showdown competition, where sixteen teams competed in known-item and ad-hoc search tasks. Many of the teams utilized state-of-the-art video retrieval approaches that demonstrated high effectiveness in challenging search scenarios. In this paper, a broad survey of all utilized approaches is presented in connection with an analysis of the performance of participating teams. Specifically, high-level performance indicators are presented with overall statistics, together with an in-depth analysis of the performance of selected tools implementing result set logging. The analysis reveals evidence that the CLIP model represents a versatile tool for cross-modal video retrieval when combined with interactive search capabilities. Furthermore, the analysis investigates the effect of different users and text query properties on the performance in search tasks. Last but not least, lessons learned from search task preparation are presented, and a new direction for ad-hoc search based tasks at Video Browser Showdown is introduced.
Source: Multimedia systems (2023). doi:10.1007/s00530-023-01143-5
DOI: 10.1007/s00530-023-01143-5
Project(s): AI4Media via OpenAIRE, XRECO

See at: ISTI Repository Open Access | ZENODO | link.springer.com Restricted | CNR ExploRA

2023 Conference article Open Access

VISIONE: a large-scale video retrieval system with advanced search functionalities
Amato G., Bolettieri P., Carrara F., Falchi F., Gennaro C., Messina N., Vadicamo L., Vairo C.
VISIONE is a large-scale video retrieval system that integrates multiple search functionalities, including free text search, spatial color and object search, visual and semantic similarity search, and temporal search. The system leverages cutting-edge AI technology for visual analysis and advanced indexing techniques to ensure scalability. As demonstrated by its runner-up position in the 2023 Video Browser Showdown competition, VISIONE effectively integrates these capabilities to provide a comprehensive video retrieval solution. A system demo is available online, showcasing its capabilities on over 2300 hours of diverse video content (V3C1+V3C2 dataset) and 12 hours of highly redundant content (Marine dataset). The demo can be accessed at https://visione.isti.cnr.it
Source: ICMR '23: International Conference on Multimedia Retrieval, pp. 649–653, Thessaloniki, Greece, 12-15/06/2023
DOI: 10.1145/3591106.3592226
Project(s): AI4Media via OpenAIRE

See at: ISTI Repository Open Access | CNR ExploRA

2023 Conference article Open Access

VISIONE at Video Browser Showdown 2023
Amato G., Bolettieri P., Carrara F., Falchi F., Gennaro C., Messina N., Vadicamo L., Vairo C.
In this paper, we present the fourth release of VISIONE, a tool for fast and effective video search on a large-scale dataset. It includes several search functionalities like text search, object and color-based search, semantic and visual similarity search, and temporal search. VISIONE uses ad-hoc textual encoding for indexing and searching video content, and it exploits a full-text search engine as search backend. In this new version of the system, we introduced some changes both to the current search techniques and to the user interface.
Source: MMM 2023 - 29th International Conference on Multimedia Modeling, pp. 615–621, Bergen, Norway, 9-12/01/2023
DOI: 10.1007/978-3-031-27077-2_48
Project(s): AI4Media via OpenAIRE

See at: ISTI Repository Open Access | ZENODO | CNR ExploRA

2023 Conference article Open Access

AIMH Lab 2022 activities for Vision
Ciampi L., Amato G., Bolettieri P., Carrara F., Di Benedetto M., Falchi F., Gennaro C., Messina N., Vadicamo L., Vairo C.
The explosion of smartphones and cameras has led to a vast production of multimedia data. Consequently, Artificial Intelligence-based tools for automatically understanding and exploring these data have recently gained much attention. In this short paper, we report some activities of the Artificial Intelligence for Media and Humanities (AIMH) laboratory of the ISTI-CNR, tackling some challenges in the field of Computer Vision for the automatic understanding of visual data and for novel interactive tools aimed at multimedia data exploration. Specifically, we provide innovative solutions based on Deep Learning techniques carrying out typical vision tasks such as object detection and visual counting, with particular emphasis on scenarios characterized by scarcity of labeled data needed for the supervised training and on environments with limited power resources imposing miniaturization of the models. Furthermore, we describe VISIONE, our large-scale video search system designed to search extensive multimedia databases in an interactive and user-friendly manner.
Source: Ital-IA 2023, pp. 538–543, Pisa, Italy, 29-31/05/2023
Project(s): AI4Media via OpenAIRE

See at: ceur-ws.org Open Access | ISTI Repository | CNR ExploRA

2023 Conference article Open Access

Vec2Doc: transforming dense vectors into sparse representations for efficient information retrieval
Carrara F., Gennaro C., Vadicamo L., Amato G.
Source: SISAP 2023 - 16th International Conference on Similarity Search and Applications, pp. 215–222, A Coruña, Spain, 9-11/10/2023
DOI: 10.1007/978-3-031-46994-7_18
Project(s): AI4Media via OpenAIRE

See at: ISTI Repository Open Access | CNR ExploRA

2023 Journal article Open Access

Induced permutations for approximate metric search
Vadicamo L., Amato G., Gennaro C.
Permutation-based Indexing (PBI) approaches have been proven to be particularly effective for conducting large-scale approximate metric searching. These methods rely on the idea of transforming the original metric objects into permutation representations, which can be efficiently indexed using data structures such as inverted files. The standard conceptualization of permutation associated with a metric object involves only the use of object distances and their relative orders from a set of anchors called pivots. In this paper, we generalized this definition in order to enlarge the class of permutation representations that can be used by PBI approaches. In particular, we introduced the concept of permutation induced by a space transformation and a sorting function, and we investigated which properties these transformations should possess to produce permutations that are effective for metric search. Furthermore, as a practical outcome, we defined a new type of permutation representation that is calculated using distances from pairs of pivots. This proposed technique allowed us to produce longer permutations than traditional ones for the same number of object-pivot distance calculations. The advantage lies in the fact that when longer permutations are employed, the use of inverted files built on permutation prefixes leads to greater efficiency in the search phase.
Source: Information systems (Oxf.) 119 (2023). doi:10.1016/j.is.2023.102286
DOI: 10.1016/j.is.2023.102286
Project(s): AI4EU via OpenAIRE, AI4Media via OpenAIRE

See at: ISTI Repository Open Access | Information Systems Restricted | www.sciencedirect.com | CNR ExploRA
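
The standard pivot-based permutation mentioned in the abstract above can be illustrated with a minimal sketch. It covers only the traditional definition (pivots ordered by their distance from the object), not the induced or pivot-pair permutations the paper introduces. The toy pivots, the Euclidean distance, the function names, and the use of the Spearman footrule to compare permutations are all assumptions made for this illustration.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def pivot_permutation(obj, pivots, dist=euclidean):
    """Pivot indices sorted by increasing distance from obj (ties broken by index)."""
    return sorted(range(len(pivots)), key=lambda i: (dist(obj, pivots[i]), i))

def spearman_footrule(perm_a, perm_b):
    """Sum of absolute differences between each pivot's rank in the two permutations."""
    rank_a = {p: r for r, p in enumerate(perm_a)}
    rank_b = {p: r for r, p in enumerate(perm_b)}
    return sum(abs(rank_a[p] - rank_b[p]) for p in rank_a)

# Hypothetical toy data: four pivots at the corners of a square, three 2-D objects.
pivots = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
near_a, near_b, far_c = (1.0, 2.0), (2.0, 1.0), (9.0, 9.0)

perm_a = pivot_permutation(near_a, pivots)
perm_b = pivot_permutation(near_b, pivots)
perm_c = pivot_permutation(far_c, pivots)

# A permutation prefix keeps only the closest pivots; such prefixes are what an
# inverted file would index in a prefix-based scheme.
prefix_a = perm_a[:2]

print(perm_a, perm_b, perm_c)
print(spearman_footrule(perm_a, perm_b), spearman_footrule(perm_a, perm_c))
```

In this toy setting the two nearby objects receive similar permutations while the distant one does not, which is the property that makes permutations usable as a proxy for the original distance.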

2023 Report Open Access

AIMH Research Activities 2023
Aloia N., Amato G., Bartalesi V., Bianchi L., Bolettieri P., Bosio C., Carraglia M., Carrara F., Casarosa V., Ciampi L., Coccomini D. A., Concordia C., Corbara S., De Martino C., Di Benedetto M., Esuli A., Falchi F., Fazzari E., Gennaro C., Lagani G., Lenzi E., Meghini C., Messina N., Molinari A., Moreo A., Nardi A., Pedrotti A., Pratelli N., Puccetti G., Rabitti F., Savino P., Sebastiani F., Sperduti G., Thanos C., Trupiano L., Vadicamo L., Vairo C., Versienti L.
The AIMH (Artificial Intelligence for Media and Humanities) laboratory is dedicated to exploring and pushing the boundaries in the field of Artificial Intelligence, with a particular focus on its application in digital media and humanities. This lab's objective is to enhance the current state of AI technology, particularly in deep learning, text analysis, computer vision, multimedia information retrieval, multimedia content analysis, recognition, and retrieval. This report encapsulates the laboratory's progress and activities throughout the year 2023.
Source: ISTI Annual Reports, 2023
DOI: 10.32079/isti-ar-2023/001

See at: ISTI Repository Open Access | CNR ExploRA

2023 Conference article Open Access

VISIONE for newbies: an easier-to-use video retrieval system
Amato G., Bolettieri P., Carrara F., Falchi F., Gennaro C., Messina N., Vadicamo L., Vairo C.
This paper presents a revised version of the VISIONE video retrieval system, which offers a wide range of search functionalities, including free text search, spatial color and object search, visual and semantic similarity search, and temporal search. The system is designed to ensure scalability using advanced indexing techniques and effectiveness using cutting-edge Artificial Intelligence technology for visual content analysis. VISIONE was the runner-up in the 2023 Video Browser Showdown competition, demonstrating its comprehensive video retrieval capabilities. In this paper, we detail the improvements made to the search and browsing interface to enhance its usability for non-expert users. A demonstration video of our system with the restyled interface, showcasing its capabilities on over 2,300 hours of diverse video content, is available online at https://youtu.be/srD3TCUkMSg.
Source: CBMI 2023 - 20th International Conference on Content-based Multimedia Indexing, pp. 158–162, Orleans, France, 20-22/09/2023
DOI: 10.1145/3617233.3617261
Project(s): AI4Media via OpenAIRE

See at: ISTI Repository Open Access | CNR ExploRA

2022 Journal article Open Access

Interactive video retrieval evaluation at a distance: comparing sixteen interactive video search systems in a remote setting at the 10th Video Browser Showdown
Heller S., Gsteiger V., Bailer W., Gurrin C., Jonsson B. T., Lokoc J., Leibetseder A., Mejzlik F., Peska L., Rossetto L., Schall K., Schoeffmann K., Schuldt H., Spiess F., Tran L. D., Vadicamo L., Vesely P., Vrochidis S., Wu J.
The Video Browser Showdown addresses difficult video search challenges through an annual interactive evaluation campaign attracting research teams focusing on interactive video retrieval. The campaign aims to provide insights into the performance of participating interactive video retrieval systems, tested by selected search tasks on large video collections. For the first time in its ten-year history, the Video Browser Showdown 2021 was organized in a fully remote setting and hosted a record number of sixteen scoring systems. In this paper, we describe the competition setting, tasks and results and give an overview of state-of-the-art methods used by the competing systems. By looking at query result logs provided by ten systems, we analyze differences in retrieval model performances and browsing times before a correct submission. Through advances in data gathering methodology and tools, we provide a comprehensive analysis of ad-hoc video search tasks, discuss results, task design and methodological challenges. We highlight that almost all top performing systems utilize some sort of joint embedding for text-image retrieval and enable specification of temporal context in queries for known-item search. Whereas a combination of these techniques drives the currently top performing systems, we identify several future challenges for interactive video search engines and the Video Browser Showdown competition itself.
Source: International journal of multimedia information retrieval Print 11 (2022). doi:10.1007/s13735-021-00225-2
DOI: 10.1007/s13735-021-00225-2
Project(s): AI4Media via OpenAIRE

See at: ISTI Repository Open Access | link.springer.com Restricted | CNR ExploRA

2022 Conference article Open Access

MOBDrone: a drone video dataset for Man OverBoard Rescue
Cafarelli D., Ciampi L., Vadicamo L., Gennaro C., Berton A., Paterni M., Benvenuti C., Passera M., Falchi F.
Modern Unmanned Aerial Vehicles (UAV) equipped with cameras can play an essential role in speeding up the identification and rescue of people who have fallen overboard, i.e., man overboard (MOB). To this end, Artificial Intelligence techniques can be leveraged for the automatic understanding of visual data acquired from drones. However, detecting people at sea in aerial imagery is challenging primarily due to the lack of specialized annotated datasets for training and testing detectors for this task. To fill this gap, we introduce and publicly release the MOBDrone benchmark, a collection of more than 125K drone-view images in a marine environment under several conditions, such as different altitudes, camera shooting angles, and illumination. We manually annotated more than 180K objects, of which about 113K are man overboard instances, precisely localizing them with bounding boxes. Moreover, we conduct a thorough performance analysis of several state-of-the-art object detectors on the MOBDrone data, serving as baselines for further research.
Source: ICIAP 2022 - 21st International Conference on Image Analysis and Processing, pp. 633–644, Lecce, Italy, 23-27/05/2022
DOI: 10.1007/978-3-031-06430-2_53

See at: ISTI Repository Open Access | link.springer.com Restricted | CNR ExploRA

2022 Dataset Open Access

MOBDrone: a large-scale drone-view dataset for man overboard detection
Cafarelli D., Ciampi L., Vadicamo L., Gennaro C., Berton A., Paterni M., Benvenuti C., Passera M., Falchi F.
The Man OverBoard Drone (MOBDrone) dataset is a large-scale collection of aerial footage images. It contains 126,170 frames extracted from 66 video clips gathered from one UAV flying at an altitude of 10 to 60 meters above the mean sea level. Images are manually annotated with more than 180K bounding boxes localizing objects belonging to 5 categories: person, boat, lifebuoy, surfboard, wood. More than 113K of these bounding boxes belong to the person category and localize people in the water simulating the need to be rescued.

See at: ISTI Repository Open Access | CNR ExploRA | zenodo.org

2022 Conference article Open Access

A task category space for user-centric comparative multimedia search evaluations
Lokoc J., Bailer W., Barthel K. U., Gurrin C., Heller S., Jónsson B. Þ., Peska L., Rossetto L., Schoeffmann K., Vadicamo L., Vrochidis S., Wu J.
In the last decade, user-centric video search competitions have facilitated the evolution of interactive video search systems. So far, these competitions focused on a small number of search task categories, with few attempts to change task category configurations. Based on our extensive experience with interactive video search contests, we have analyzed the spectrum of possible task categories and propose a list of individual axes that define a large space of possible task categories. Using this concept of category space, new user-centric video search competitions can be designed to benchmark video search systems from different perspectives. We further analyse the three task categories considered so far at the Video Browser Showdown and discuss possible (but sometimes challenging) shifts within the task category space.
Source: MMM 2022 - 28th International Conference on Multimedia Modeling, pp. 193–204, Phu Quoc, Vietnam, 06-10/06/2022
DOI: 10.1007/978-3-030-98358-1_16
Project(s): AI4Media via OpenAIRE

See at: ISTI Repository Open Access | doi.org Restricted | link.springer.com | CNR ExploRA

2022 Conference article Open Access

VISIONE at Video Browser Showdown 2022
Amato G., Bolettieri P., Carrara F., Falchi F., Gennaro C., Messina N., Vadicamo L., Vairo C.
VISIONE is a content-based retrieval system that supports various search functionalities (text search, object/color-based search, semantic and visual similarity search, temporal search). It uses a full-text search engine as a search backend. In the latest version of our system, we modified the user interface, and we made some changes to the techniques used to analyze and search for videos.
Source: MMM 2022 - 28th International Conference on Multimedia Modeling, pp. 543–548, Phu Quoc, Vietnam, 06-10/06/2022
DOI: 10.1007/978-3-030-98355-0_52
Project(s): AI4EU via OpenAIRE, AI4Media via OpenAIRE

See at: ISTI Repository Open Access | doi.org Restricted | link.springer.com | CNR ExploRA

2022 Conference article Open Access

Approximate nearest neighbor search on standard search engines
Carrara F., Vadicamo L., Gennaro C., Amato G.
Approximate search for high-dimensional vectors is commonly addressed using dedicated techniques often combined with hardware acceleration provided by GPUs, FPGAs, and other custom in-memory silicon. Despite their effectiveness, harmonizing those optimized solutions with other types of searches often poses technological difficulties. For example, to implement a combined text+image multimodal search, we are forced first to query the index of high-dimensional image descriptors and then filter the results based on the textual query or vice versa. This paper proposes a text surrogate technique to translate real-valued vectors into text and index them with a standard textual search engine such as Elasticsearch or Apache Lucene. This technique allows us to perform approximate kNN searches of high-dimensional vectors alongside classical full-text searches natively on a single textual search engine, enabling multimedia queries without sacrificing scalability. Our proposal exploits a combination of vector quantization and scalar quantization. We compared our approach to the existing literature in this field of research, demonstrating a significant improvement in performance through preliminary experimentation.
Source: SISAP 2022 - 15th International Conference on Similarity Search and Applications, pp. 214–221, Bologna, Italy, 7-9/10/2022
DOI: 10.1007/978-3-031-17849-8_17
Project(s): AI4Media via OpenAIRE

See at: ISTI Repository Open Access | link.springer.com Restricted | CNR ExploRA
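
The general idea of a text surrogate, as described in the abstract above, can be sketched in a simplified form. This is not the method of the paper, which combines vector quantization and scalar quantization. The sketch uses plain scalar quantization only: each component is turned into a repeated synthetic token so that term frequencies approximate the component values. The token names (`f0`, `f1`, ...), the `scale` factor, the clipping of negative values, and the term-frequency dot product standing in for a search engine's scoring are all assumptions made for illustration; an actual engine's scoring function may differ.

```python
from collections import Counter

def surrogate_text(vector, scale=10):
    """Encode a vector as text: token f<i> is repeated round(v_i * scale) times.

    Negative components are clipped to zero in this sketch (an assumption).
    """
    tokens = []
    for i, value in enumerate(vector):
        repetitions = int(round(max(value, 0.0) * scale))
        tokens.extend([f"f{i}"] * repetitions)
    return " ".join(tokens)

def tf_dot(text_a, text_b):
    """Dot product of term-frequency vectors, a stand-in for engine scoring."""
    tf_a, tf_b = Counter(text_a.split()), Counter(text_b.split())
    return sum(tf_a[term] * tf_b[term] for term in tf_a)

# Hypothetical 3-dimensional descriptors.
doc_text = surrogate_text([0.3, 0.0, 0.5])
query_text = surrogate_text([0.1, 0.9, 0.2])

print(doc_text)
print(tf_dot(doc_text, query_text))
```

With this encoding, the term-frequency dot product equals the inner product of the quantized vectors (scaled by `scale` squared), which is why a text index can approximate a vector similarity ranking.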

2022 Other Open Access

COCO, LVIS, Open Images V4 classes mapping
Amato G., Bolettieri P., Carrara F., Falchi F., Gennaro C., Messina N., Vadicamo L., Vairo C.
This repository contains a mapping between the classes of COCO, LVIS, and Open Images V4 datasets into a unique set of 1460 classes. COCO [Lin et al. 2014] contains 80 classes, LVIS [Gupta et al. 2019] contains 1460 classes, Open Images V4 [Kuznetsova et al. 2020] contains 601 classes. We built a mapping of these classes using a semi-automatic procedure in order to have a unique final list of 1460 classes. We also generated a hierarchy for each class, using WordNet.
Project(s): AI4Media via OpenAIRE

See at: zenodo.org Open Access | CNR ExploRA

2022 Conference article Open Access

Investigating binary partition power in metric query
Connor R., Dearle A., Vadicamo L.
It is generally understood that, as dimensionality increases, the minimum cost of metric query tends from O(log n) to O(n) in both space and time, where n is the size of the data set. With low dimensionality, the former is easy to achieve; with very high dimensionality, the latter is inevitable. We previously described BitPart as a novel mechanism suitable for performing exact metric search in "high(er)" dimensions. The essential tradeoff of BitPart is that its space cost is linear with respect to the size of the data, but the actual space required for each object may be as small as log2 n bits, which allows even very large data sets to be queried using only main memory. Potentially the time cost still scales with O(log n). Together these attributes give exact search which outperforms indexing structures if dimensionality is within a certain range. In this article, we reiterate the design of BitPart in this context. The novel contribution is an in-depth examination of what the notion of "high(er)" means in practical terms. To do this we introduce the notion of exclusion power, and show its application to some generated data sets across different dimensions.
Source: SEBD 2022 - 30th Italian Symposium on Advanced Database Systems, pp. 415–426, Tirrenia (PI), Italy, 19-22/06/2022

See at: ceur-ws.org Open Access | ISTI Repository | CNR ExploRA

2022 Conference article Open Access

On the expected exclusion power of binary partitions for metric search
Vadicamo L., Dearle A., Connor R.
The entire history and, we dare say, future of similarity search is governed by the underlying notion of partition. A partition is an equivalence relation defined over the space, therefore each element of the space is contained within precisely one of the equivalence classes of the partition. All attempts to search a finite space efficiently, whether exactly or approximately, rely on some set of principles which imply that if the query is within one equivalence class, then one or more other classes either cannot, or probably do not, contain any of its solutions. In most early research, partitions relied only on the metric postulates, and logarithmic search time could be obtained on low dimensional spaces. In these cases, it was straightforward to identify multiple partitions, each of which gave a relatively high probability of identifying subsets of the space which could not contain solutions. Over time the datasets being searched have become more complex, leading to higher dimensional spaces. It is now understood that even an approximate search in a very high-dimensional space is destined to require O(n) time and space. Almost entirely missing from the research literature however is any analysis of exactly when this effect takes over. In this paper, we make a start on tackling this important issue. Using a quantitative approach, we aim to shed some light on the notion of the exclusion power of partitions, in an attempt to better understand their nature with respect to increasing dimensionality.
Source: SISAP 2022 - 15th International Conference on Similarity Search and Applications, pp. 104–117, Bologna, Italy, 7-9/10/2022
DOI: 10.1007/978-3-031-17849-8_9
Project(s): AI4Media via OpenAIRE

See at: ISTI Repository Open Access | link.springer.com Restricted | CNR ExploRA

2022 Journal article Open Access

A leap among quantum computing and quantum neural networks: a survey
Massoli F. V., Vadicamo L., Amato G., Falchi F.
In recent years, Quantum Computing has witnessed massive improvements in terms of available resources and algorithm development. The ability to harness quantum phenomena to solve computational problems is a long-standing dream that has drawn the scientific community's interest since the late 80s. In such a context, we propose our contribution. First, we introduce basic concepts related to quantum computations, and then we explain the core functionalities of technologies that implement the Gate Model and Adiabatic Quantum Computing paradigms. Finally, we gather, compare and analyze the current state-of-the-art concerning Quantum Perceptrons and Quantum Neural Networks implementations.
Source: ACM computing surveys (2022). doi:10.1145/3529756
DOI: 10.1145/3529756
DOI: 10.48550/arxiv.2107.03313
Project(s): AI4EU via OpenAIRE, AI4Media via OpenAIRE

See at: arXiv.org e-Print Archive Open Access | ISTI Repository | ACM Computing Surveys Restricted | doi.org | CNR ExploRA